REMARKS 



This document is filed in reply to the final office action dated August 30, 2004 ("Office 
Action"). 

Applicants have amended the specification to insert two sequence identifiers, "SEQ ID 
NO: 12" and "SEQ ID NO: 13." The sequence of SEQ ID NO: 12, i.e., nucleotides 81889-83238 
of the E. coli genome (GenBank Accession No. AP002562), was presented in the substitute 
sequence listing filed with the last response dated June 29, 2004. SEQ ID NO: 13 corresponds to 
nucleotides 768-974 of SEQ ID NO: 12 and represents the open reading frame ECs3459. 1 A 
substitute sequence listing is submitted herewith to include the sequence of SEQ ID NO: 13. 

Applicants have amended claims 8 and 36, drawn to nucleic acids, to specify that the 
nucleic acid of claim 8 and the third nucleic acid covered by claim 36 specifically hybridize to 
SEQ ID NOs: 13 and 12, respectively. Support for "specifically hybridize to" SEQ ID NO: 13 or 
12 can be found in Example 5, page 11, line 20 through page 13, line 3 of the Specification. 
Finally, Applicants have amended claims 1, 8, 15, and 36-39 to promote clarity. No new matter 
has been added. 

The amendments should be entered as they raise no new issues that will require 
further consideration or search and also do not touch the merits of the application within 
the meaning of 37 C.F.R. § 1.116(b). 

Claims 1-15 and 23-43 are pending. Claims 27-35 have been withdrawn from further 
consideration as being drawn to a non-elected invention. Claims 1-15, 23-26, and 36-43 are 
now under examination. Reconsideration of this application is requested in view of the 
following remarks: 

Rejection under 35 U.S.C. § 112, second paragraph 

The Examiner rejected claims 1-15, 23-26, and 36-43 for indefiniteness. See the Office 
Action, page 7, last paragraph. In view of the above amendments, Applicants submit that the 
rejection has been overcome. 



1 The open reading frame ECs3459 corresponds to nucleotides 82656-82963 of GenBank Accession No. AP002562 
(SEQ ID NO: 12). See Exhibit A attached hereto. In other words, it corresponds to nucleotides 768-974 of SEQ ID 
NO: 12. 
Rejection under 35 U.S.C. § 112, first paragraph 

The Examiner rejected claims 8-14 and claims 36-38 for failing to comply with the 
written description requirement. Applicants have amended claims 8 and 36, and will discuss 
claim 8 first. 

Claim 8 covers a nucleic acid amplified from an E. coli nucleic acid template with a pair 
of primers containing SEQ ID NO: 1 or 3 and SEQ ID NO: 2 or 4, respectively. In the office 
action dated April 1, 2004, the Examiner rejected the claim for not meeting the written 
description requirement, alleging that "[w]ith the exception of the recited SEQ ID NOs, the 
skilled artisan cannot envision the detailed chemical structure of the encompassed polynucleotide 
. . . (emphasis added)" In the last response dated June 29, 2004, Applicants amended claim 8 to 
recite "the nucleic acid contains E. coli open reading frame ECs3459" to include "a detailed 
chemical structure of the encompassed polynucleotide." 

However, the Examiner maintained the rejection on the ground that "the specification has 
not taught what the sequence of ECs3459 is." See the Office Action, page 7, lines 5-7. 

Applicants disagree. In fact, the Specification teaches that "the nucleotide sequence 
between 81889-83238 of GenBank Accession No. AP002562 (SEQ ID NO: 12) contains ... 
ECs3459 ..." See the Specification, page 3, lines 3-7. As the Specification discloses the entire 
sequence of SEQ ID NO: 12, it also teaches the sequence of ECs3459, a segment of SEQ ID NO: 
12. In the sole interest of moving this case toward allowance, Applicants (1) have included the 
nucleotide sequence of ECs3459, i.e., SEQ ID NO: 13, in the enclosed substitute sequence 
listing, and (2) have inserted "SEQ ID NO: 13" in both the Specification and claim 8. 

The Examiner also countered that "even if the specification had taught the specific 
sequence of ECs3459, the 'detailed chemical structure of the encompassed polynucleotides' as 
asserted by the response would not limit the genus of claimed nucleic acids to exclude unknown, 
uncharacterized variants and homolog[ou]s ..." See the Office Action, page 7, lines 7-10. 
Citing University of California v. Eli Lilly & Co., the Examiner 
further stated that "[t]he claim(s) contains subject matter which was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that the 
inventor(s), at the time the application was filed had possession of the claimed invention." See 
the Office Action, page 3, lines 7-10; and page 5, line 11 through page 6, line 19. 
Applicants would like to remind the Examiner of the "Guidelines for the Examination of 
Patent Applications Under the 35 U.S.C. 112, para. 1, 'Written Description' Requirement" 
set forth in MPEP 2163 ("Guidelines"). Referring to the same court's decision, the Guidelines 
provide the following: 

The written description requirement for a claimed genus may be satisfied through 
sufficient description of a representative number of species by actual reduction to 
practice .... See Eli Lilly, 119 F.3d at 1568, 43 USPQ2d at 1406. . . . What 
constitutes a "representative number" is an inverse function of the skill and 
knowledge in the art. Satisfactory disclosure of a "representative number" 
depends on whether one of skill in the art would recognize that the applicant was 
in possession of the necessary common attributes or features of the elements 
possessed by the members of the genus in view of the species disclosed. 

See MPEP 2163 II.A.3.ii (emphasis added). 

Applicants note that the Specification describes 64 nucleic acids amplified from as many 
as 64 strains of E. coli "by actual reduction to practice." See, e.g., page 9, Table 2, rows 3-8; and 
page 8, lines 3-15. These amplified nucleic acids are covered by claim 8. In view of these 64 
species, one of skill in the art would recognize that Applicants were in possession of the 
necessary common attributes or features of the elements possessed by the members of the genus, 
such as containing open reading frame ECs3459 (SEQ ID NO: 13) and specifically hybridizing 
to SEQ ID NO: 13 or its complement. Thus, the 64 species constitute a "representative number 
of species" and the written description requirement for the claimed genus is satisfied. 

In this connection, Applicants would like to point out that the entire genome sequences of 
these E. coli strains were already determined. See, e.g., Hayashi et al., DNA Research 8, 11-22 
(2001) (copy attached hereto as "Exhibit C") and Exhibit A. Accordingly, in view of the 
teachings from the specification, one skilled in the art could readily obtain the amplified 
sequences and recognize (1) "the detailed chemical structure of the encompassed 
polynucleotides," and (2) Applicants' "possession of the necessary common attributes or features 
of the elements possessed by the members of the genus in view of the species disclosed." 

To move this case toward allowance, Applicants have amended claim 8 to point out that 
the claimed nucleic acid specifically hybridizes to SEQ ID NO: 13 or its complement. 
Applicants also would like to remind the Examiner of "Synopsis of Application of 
Written Description Guidelines" (www.uspto.gov/web/menu/written.pdf; "Synopsis"). 

Example 9 of the Synopsis illustrates a hypothetical situation that mirrors the present 
case. Just as in Example 9, claim 8 is drawn to a "nucleic acid that hybridizes to SEQ ID NO: 
[13 or its complete complement] under highly stringent conditions and encodes [ECs3459]." Just 
as in Example 9, "[t]he claim is drawn to a genus of nucleic acids all of which must hybridize 
with SEQ ID NO: [13] and must encode [ECs3459]." Furthermore, just as in Example 9, 
"[t]here is a single species disclosed ([SEQ ID NO: 12]) that is within the scope of the claimed 
genus" and "[t]here is actual reduction to practice of the disclosed species." Example 9 then 
provides the following guidance to examiners: 

"Now turning to the genus analysis, a person of skill in the art would not expect 
substantial variation among species encompassed within the scope of the claim because 
the highly stringent hybridization conditions set forth in the claim yield structurally 
similar DNAs. Thus, a representative number of species is disclosed, since highly 
stringent hybridization conditions in combination with the . . . function of DNA and the 
level of skill and knowledge in the art are adequate to determine that applicant was in 
possession of the claimed invention. 

Conclusion: The claimed invention is adequately described." 

In view of the very clear instructions from the Synopsis, Applicants submit that the 
claimed invention is adequately described on this independent ground. 

For the reasons set forth above, claim 8 meets the written description requirement. 
Claims 9-14, dependent from claim 8, specify the sequences or lengths of the primers recited in 
claim 8. By the same token, they also meet the written description requirement. 

The Examiner also rejected claims 36-39, drawn to a set of nucleic acids that include a 
third nucleic acid, on the same grounds discussed above. Applicants have amended claim 36 to 
specify that the third nucleic acid must specifically hybridize under highly stringent conditions to 
SEQ ID NO: 12 or its complete complement. For the same reasons set forth above, claim 36 also 
meets the written description requirement. So do claims 37-39, which depend from claim 36 and 
further specify the length of the third nucleic acid. 
Rejection under 35 U.S.C. § 102(b) 

The Examiner rejected claim 15, which recites "SEQ ID NOs.: [5 and 6] and sequences 
complementary thereto" for being anticipated by GenBank Accession Nos. AF175847 and 
AX002476. See the Office Action, page 9, lines 6 and 7, and lines 15 and 16. It is the 
Examiner's position that the recitation "sequences complementary thereto" does not "limit the 
claim to sequences that contains the complete complement of SEQ ID NO: [5 or 6]." 

Applicants have amended claim 15 to recite "complete complement," and submit that 
claim 15, as amended, is not anticipated by either of the two cited GenBank entries. 

Rejection under 35 U.S.C. § 103(a) 

The Examiner rejected claims 1-3, 5, 6, 8-15, and 36-39 for obviousness over GenBank 
Accession No. AE005490, GenBank Accession No. AE000346, GenBank Accession No. 
Z70523, and GenBank Accession No. D90887, in view of Buck et al. Biotechniques, 1999, 
27(3): 528-536 ("Buck"), U.S. Patent 5,374,718 to Hammond et al. ("Hammond"), U.S. Patent 
5,693,769 to Hogan ("Hogan"), and Tijhie et al., J. Microbiol. Meth. Vol. 18, pp 137-150, 1993 
("Tijhie"). See the Office Action, the paragraph bridging pages 10 and 11. 

Applicants respectfully traverse and will discuss independent claim 1 first. Claim 1 
covers a set of nucleic acids that include a pair of primers, containing SEQ ID NO: 1 or 3 and 
SEQ ID NO: 2 or 4, respectively. According to the Examiner, (i) the 4 GenBank Accession Nos. 
teach sequences that cover the primer/probe SEQ ID NOs recited in the rejected claims; (ii) 
Hammond and Tijhie teach picking primers or probes for detection of Chlamydia pneumoniae; 
(iii) Hogan teaches targeting sequences within the E. coli genome for detection of E. coli; and 
(iv) Buck supports that all nucleic acids selected from the prior art sequences would be expected 
to function as primers. The Examiner then concluded that it would have been 
obvious to one skilled in the art to combine all of the cited references and to select PCR primers 
from the sequences described therein to generate the claimed nucleic acid set. 

The Examiner appeared to believe that the cited references suggest a genus of nucleic 
acid primers encompassing the primers recited in claim 1, thereby rendering the claim at issue 
obvious. Applicants disagree and would like to remind the Examiner that "[t]he fact that a 
claimed species or subgenus is encompassed by a prior art genus is not sufficient by itself to 
establish a prima facie case of obviousness. . . . Some motivation to select the claimed species or 
subgenus must be taught by the prior art." (MPEP 2144.08). Applicants note that even if one 
skilled in the art would have combined the cited references, he or she would have to choose the 
recited primers from an astronomical number of candidates. Since the cited references do not 
provide "[any] motivation to select the claimed species or subgenus" from the enormous number 
of candidates, they do not render the claims obvious. 

The Examiner also appeared to base her conclusion on the assumption that all primers 
selected from the prior art sequences would be operational, i.e., would amplify the contemplated 
products. Applicants disagree. In fact, as shown in the Declaration of Dr. Bair ("Exhibit B"), a 
pair of primers, N1 and N2, successfully amplified the contemplated product from all E. coli samples tested. 
These primers are selected from the E. coli genome and are covered by the claims. In contrast, 
EC-23 and EC-24, a pair of primers having different sequences selected from the same genome, 
failed to amplify the contemplated product from the samples. Clearly, contrary to the 
Examiner's assumption, not all members of the genus would function. Thus, the Examiner's 
conclusion is untenable. 

In view of the above remarks, Applicants submit that claim 1 is non-obvious over the 
cited references. So are claims 2, 3, 5, 6, and 36-39, all of which depend from claim 1. 

Independent claim 8 is drawn to a nucleic acid obtained from amplification of an E. coli 
nucleic acid template with a pair of primers having, respectively, SEQ ID NO: 1 or 3 and SEQ 
ID NO: 2 or 4. Claims 9-14, which depend from claim 8, further specify the sequences or lengths of the 
recited primers. Independent claim 15 is drawn to a nucleic acid selected from a group 
consisting of SEQ ID NOs: 5-8 and their complete complements. For the same reasons set forth 
above, claims 8-15 are also non-obvious over the cited references. 

CONCLUSION 

Applicants submit that the grounds for the rejection asserted by the Examiner have been 
overcome, and that the claims, as pending, define subject matter that is sufficiently described, 
definite, novel, and non-obvious. Thus, it is submitted that allowance of this application is 
proper, and early favorable action is solicited. 
Enclosed please find a Petition for One Month Extension of Time with the required fee of 
$120. Please apply any other charges to deposit account 06-1050, referencing attorney docket 
12674-005001. 



Date: 



PTO Customer No. 26161 
Fish & Richardson P.C. 
225 Franklin Street 
Boston, MA 02110-2804 
Telephone: (617)542-5070 
Facsimile: (617)542-8906 



Respectfully submitted, 




Jianming Hao, Ph.D. 
Reg. No. 54,694 
Abstract 

Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic 
colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 
strain isolated from the Sakai outbreak, and the results of genomic comparison with a benign laboratory 
strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 Kb larger than that of K-12. We identified a 
4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone 
of the E. coli chromosome. The remaining 1.4-Mb sequence comprises O157:H7-specific sequences, most 
of which are horizontally transferred foreign DNAs. The predominant role of bacteriophages in the 
emergence of O157:H7 is evident from the presence of 24 prophages and prophage-like elements that occupy 
more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 
tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related 
functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in 
the efficient expression of the strain-specific genes. The complete set of O157:H7-specific genes presented 
here sheds new light on the pathogenicity and the physiology of O157:H7, and will open a way 
to fully understanding the molecular mechanisms underlying O157:H7 infection. 
Key words: E. coli O157:H7; genome sequence; E. coli K-12; bacterial pathogenicity; evolution 
1. Introduction 

Since enterohemorrhagic Escherichia coli (EHEC) O157:H7 was first recognized as a gastrointestinal 
pathogen in 1982, 1 its occurrence has become a worldwide public health problem, causing sporadic 
incidents as well as outbreaks of hemorrhagic colitis. Although other E. coli serotypes, such as 
O26:H11 and O111:NM, share a similar pathogenic potential, most large outbreaks of EHEC infection 
have been caused by O157:H7. 2 A prominent example is the huge outbreak that occurred in 1996 in 
primary schools of Sakai City, Osaka Prefecture, Japan, where more than 6000 schoolchildren were 
affected. 3 EHEC causes not only hemorrhagic colitis but also serious complications such as hemolytic 
uremic syndrome (HUS), sometimes resulting in death. In the Sakai outbreak, approximately 1000 patients 
were hospitalized with severe gastrointestinal symptoms and about 100 victims had complications of HUS, 
resulting in 3 deaths. Virulence determinants contributing to EHEC infection have been partly 
characterized, but the mechanisms by which EHEC causes hemorrhagic colitis and HUS are not fully 
understood. 4 

We have determined the genome sequence of an O157:H7 strain isolated from the Sakai outbreak. The 
genome sequence of the benign laboratory strain K-12 MG1655 has already been determined, 5 but, to our 
knowledge, this strain (referred to as O157 Sakai) is the first pathogenic E. coli strain whose genome 
has been fully sequenced. By comparing the two strains, we identified all chromosomal components 
specific to each strain as well as those conserved in both strains. These results provide a broad array 
of whole-genome-level information, not only for obtaining a complete set of genes potentially related 
to the pathogenicity of O157:H7 but also for understanding the evolution of E. coli strains. 

2. Materials and Methods 

2.1. Bacterial strain 

EHEC O157:H7 (RIMD 0509952) was isolated from a typical patient during the Sakai outbreak. This strain 
produces two Shiga toxins, Stx1 and Stx2, and contains two plasmids, pO157 and pOSAK1. The sequences of 
the plasmids, the prophages encoding Stx1 and Stx2, and the seven sets of rrn operons were reported 
previously. 6-9 The physical map of the chromosome was also reported. 10 

2.2. Sequencing and assembly 

The initial stage of sequencing was done by the whole-genome random shotgun method as described. 6-8 We 
constructed a pUC18-based library containing 1- to 2-Kb inserts, and sequenced 50,156 clones using a 
forward sequencing primer. After assembling the sequence data using phred/phrap/consed, 11,12 we selected 
two groups of clones: clones whose insert sequences started within 1.5 Kb of the ends of contigs and were 
oriented outward, and clones whose opposite insert ends covered regions with ambiguous sequence (phred 
quality value lower than 20). A total of 19,969 clones selected according to these criteria were 
sequenced using the reverse primer. This strategy was quite effective in reducing the number of random 
clones to be sequenced as well as in improving the sequence quality. We also constructed a lambda-based 
library with ca. 20-Kb inserts. We selected 86 clones that contained sequences non-homologous to the K-12 
sequence at either end of the inserts, and determined the entire sequence of each insert by the random 
strategy. The obtained sequences were assembled into 111 contigs larger than 1 Kb in size. At this stage, 
we visually inspected the sequence traces of all regions with low quality values, and all regions with 
any ambiguity (286 regions) were amplified by PCR and reanalyzed by direct sequencing of the PCR products. 
Subsequently, we performed gap closing by PCR according to the physical map of the chromosome and the 
results of the systematic gene mapping. 10 The physical map deduced from the whole chromosomal sequence 
determined in this study agreed with the experimentally determined map, confirming the accuracy of the 
final assembly. 
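The ambiguity criterion above relies on phred quality values, defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong; a threshold of Q = 20 therefore corresponds to an error probability of 1%. The Python sketch below is an illustration only, assuming per-base quality values are already available as a list of integers; the function names and the half-open interval convention are hypothetical and not part of phred/phrap/consed.

```python
from typing import List, Optional, Tuple

def phred_error_prob(q: int) -> float:
    """Convert a phred quality value Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def low_quality_regions(quals: List[int], threshold: int = 20) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index intervals of consecutive bases with Q < threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, q in enumerate(quals):
        if q < threshold:
            if start is None:
                start = i  # open a new low-quality run
        elif start is not None:
            regions.append((start, i))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(quals)))  # run extends to the end of the read
    return regions

if __name__ == "__main__":
    print(phred_error_prob(20))
    print(low_quality_regions([30, 30, 15, 10, 25, 12]))
```

Regions flagged this way are the kind that, under the strategy described above, would be targeted by reverse-primer reads from the opposite insert ends.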

2.3. ORF prediction, annotation, and sequence comparison with K-12 

We first defined all the O157 Sakai-specific sequences larger than 19 bp by comparing the whole 
chromosomal sequence with that of K-12 MG1655 (Accession no. U00096) using the MUMmer program. 13 Then, 
the open reading frames (ORFs) in the strain-specific regions and those in the regions conserved in the 
two strains were identified and annotated separately using Genome Gambler version 1.41, 14 GLIMMER 
2.01, 15 and BLAST. 16 In principle, ORFs larger than 150 bp were searched, with several exceptions. 
Conserved ORFs were annotated principally according to the descriptions for K-12 MG1655 5 and to "The 
E. coli Index" (http://web.bham.ac.uk/bcm4ght6/res.html), but eight small conserved ORFs were newly 
identified in this study. ORFs lying at the junctions of conserved regions and strain-specific regions 
were manually identified and annotated with guidance from BLAST results. tRNA genes were identified by 
tRNAscan-SE-1.12, 17 and other small RNAs were identified by BLAST. Paralogous gene families were 
determined using BLAST under the criteria that at least 60% of the query sequence was aligned with at 
least 30% identity. 
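The paralog criterion can be made concrete as a filter over BLAST hits followed by grouping. The Python sketch below is hypothetical: it assumes each all-against-all hit has already been parsed into query/subject IDs, query length, the aligned query interval, and percent identity (the `Hit` record is illustrative, not a BLAST output format), and it assumes families are formed by single-linkage grouping of qualifying pairs, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Hit:
    query: str
    subject: str
    query_len: int       # length of the query sequence
    q_start: int         # 1-based start of the aligned query segment
    q_end: int           # 1-based inclusive end of the aligned query segment
    pct_identity: float  # percent identity over the alignment

def passes_criteria(h: Hit, min_cov: float = 0.60, min_id: float = 30.0) -> bool:
    """At least 60% of the query aligned, with at least 30% identity (self-hits excluded)."""
    coverage = (h.q_end - h.q_start + 1) / h.query_len
    return h.query != h.subject and coverage >= min_cov and h.pct_identity >= min_id

def paralog_families(hits: List[Hit]) -> List[Set[str]]:
    """Group genes into families by single linkage over qualifying hits (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for h in hits:
        if passes_criteria(h):
            parent[find(h.query)] = find(h.subject)  # merge the two families

    families: Dict[str, Set[str]] = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return [f for f in families.values() if len(f) > 1]

if __name__ == "__main__":
    hits = [
        Hit("a", "b", 100, 1, 70, 35.0),   # 70% coverage, 35% identity: qualifies
        Hit("b", "c", 200, 1, 150, 40.0),  # 75% coverage, 40% identity: qualifies
        Hit("d", "e", 100, 1, 50, 90.0),   # only 50% coverage: rejected
        Hit("f", "g", 100, 1, 90, 25.0),   # only 25% identity: rejected
    ]
    print(paralog_families(hits))
```

Single linkage is the simplest choice here; a stricter scheme (for example, requiring reciprocal hits) would split some of these families.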

3. Results and Discussion 

3.1. Overview 

The complete sequence of the O157 Sakai chromosome is 5,498,450 bp in length (Fig. 1). Since the strain 
contains a large virulence plasmid of 92,721 bp (pO157) and a cryptic plasmid of 3306 bp (pOSAK1), 6 the 
whole genome size is 5,594,477 bp, the second largest bacterial genome sequenced so far. The 5.5-Mb 
chromosome encodes 5361 protein coding regions, 7 sets of rRNAs (16S, 23S, and 5S RNAs), 102 tRNAs, 
1 tmRNA, and at least 13 small RNAs including RNase P, 6S RNA, and 4.5S RNA. Protein-coding regions 
occupy 88.1% of the chromosome and the average length of the ORFs is 904 bp. The G + C content of the 
entire chromosome is 50.5 mol% (Table 1). 

Figure 1. Circular representation of the O157 Sakai chromosome. The outermost circle indicates the 
chromosomal location in base pairs (each tick is 100 Kb). The second and third circles show predicted 
ORFs transcribed in the clockwise and counterclockwise directions, respectively. ORFs conserved in K-12 
are depicted in green and those not present in K-12 in red. The fourth circle shows the locations of 
ORFs on prophage genomes (Sp1-18). The fifth circle shows the 20-Kb window average of G + C percent 
relative to the mean value of the chromosome. The locations of tRNA and rRNA genes are shown in the 
sixth and seventh circles, respectively. tRNAs conserved in K-12 are depicted in green, and those absent 
in K-12 are in red. 
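The genome-size and coding-density figures above can be cross-checked with a few lines of arithmetic: the whole genome is the chromosome plus the two plasmids, and 5361 ORFs averaging 904 bp should cover roughly 88.1% of the chromosome. A small Python sketch using only numbers stated in the text; because the 904-bp mean is itself rounded, the coding percentage is an approximation.

```python
CHROMOSOME_BP = 5_498_450
PO157_BP = 92_721
POSAK1_BP = 3_306
N_ORFS = 5361
MEAN_ORF_BP = 904  # rounded mean ORF length reported in the text

# Whole genome = chromosome + virulence plasmid + cryptic plasmid.
genome_bp = CHROMOSOME_BP + PO157_BP + POSAK1_BP

# Approximate share of the chromosome occupied by protein-coding regions.
coding_pct = 100 * N_ORFS * MEAN_ORF_BP / CHROMOSOME_BP

print(f"{genome_bp:,}")      # 5,594,477
print(round(coding_pct, 1))  # 88.1
```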

The chromosome is 859 Kb larger than that of K-12 MG1655. By comparing the two chromosome sequences, we 
identified an approximately 4.1-Mb sequence conserved in the two strains. There is no large rearrangement 
such as translocation or inversion in the conserved regions (Fig. 2a). The level of nucleotide sequence 
conservation in the 4.1-Mb sequence is remarkable: 98.31% identity with 2027 gaps. Since the two strains 
are known to belong to distinct E. coli lineages, 18,19 the 4.1-Mb sequence probably represents the 
chromosome backbone conserved in most E. coli strains, though it may contain some segments exceptionally 
common to the two strains but not to others. 

The backbone is, however, interrupted by numerous DNA segments of various sizes that are specific to each 
strain. These segments, which we call "strain-specific loops," are distributed throughout the backbone, 
but in an uneven manner (Fig. 2b). In O157 Sakai, larger loops were more frequently present in the regions 
surrounding the replication termination site (ter). Although replichores 1 and 2 are almost equal in 
length in K-12, replichore 1 in O157 Sakai is 290 Kb longer than replichore 2, as has previously been 
predicted. 10 The importance of chromosome symmetry has been proposed in enteric bacteria, 20 but this 
level of asymmetry appears to be permissible in E. coli. There are 296 strain-specific loops larger than 
19 bp in O157 Sakai (S-loops 1-296) and 325 loops in K-12 (K-loops 1-325). Among these, 203 loops are 
located at analogous sites on the two chromosomes, but they consist of different sequences. These sites 
may represent "hot spots" for integration of foreign DNAs or for recombination. Another striking feature 
is that most of the large loops are prophages or prophage-like elements. In O157 Sakai, 21 of 29 S-loops 
larger than 10 Kb are prophages or prophage-like elements, and the largest loop of 91.8 Kb in size 
(S-loop108) is composed of two lambda-like phages integrated in tandem. 

Figure 2. Chromosome comparison of O157 Sakai and K-12 MG1655. a: Dot plot representation of the 
nucleotide sequence conservation. Dots represent perfectly conserved sequences longer than 19 bp. 
b: Distribution of the strain-specific loops on the conserved backbone. The horizontal axis represents 
the backbone location, and the vertical bars the locations and lengths of loops specific to O157 Sakai 
(upwards) or K-12 (downwards). Loops composed of prophages and prophage-like elements are depicted by 
solid and broken red bars, respectively. Phages and phage-like elements integrated into tRNA (or tmRNA) 
genes and those carrying tRNA genes are indicated by blue triangles and green circles, respectively. 

Table 1. 

                                          Chromosome   Backbone   S-loops*      Phage      Other     pO157†    pOSAK1†     Genome
Length of sequence (bp)                    5,498,450  4,105,380  1,393,070    670,944    722,126     92,721      3,306  5,594,477
G+C ratio (%)                                   50.5       51.1       48.7       50.3       47.2       47.6       43.4       50.4
Open reading frames (ORFs)                     5,361      3,729      1,632        887        745         83          3      5,447
Protein coding region (% of genome size)        88.1
Average ORF length (bp)                          904
rRNA (16S-23S-5S)                                  7          7          0          0          0          0          0          7
tRNA and tmRNA                                   103         83         20         18          2          0          0        103
Non-classical RNA                                 13         10          3          0          0          0          0         13

Phage, Other: S-loops composed of prophages or phage remnants, and the remaining S-loops. 
* Sum of the strain-specific loops longer than 19 bp. 
† The data were taken from Makino et al. 6 
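The loop definition used here (segments longer than 19 bp that are absent from the other strain) can be pictured as the complement of the conserved, exactly matching intervals along one chromosome. The Python sketch below assumes the matches (for example, as reported by a tool such as MUMmer) have already been reduced to 0-based half-open intervals on the O157 Sakai coordinate axis; it does not call MUMmer, it merges overlapping matches, and it treats the circular chromosome as linear, all of which are simplifications.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)

def strain_specific_loops(matches: List[Interval], chrom_len: int,
                          min_len: int = 20) -> List[Interval]:
    """Return uncovered gaps of at least min_len bp (i.e. longer than 19 bp)."""
    loops: List[Interval] = []
    pos = 0  # first position not yet covered by any match
    for start, end in sorted(matches):
        if start - pos >= min_len:
            loops.append((pos, start))  # gap before this match is long enough
        pos = max(pos, end)  # overlapping matches simply extend coverage
    if chrom_len - pos >= min_len:
        loops.append((pos, chrom_len))  # trailing gap up to the chromosome end
    return loops

if __name__ == "__main__":
    conserved = [(0, 100), (130, 200), (210, 500)]
    loops = strain_specific_loops(conserved, chrom_len=600)
    print(loops)                           # [(100, 130), (500, 600)]
    print(sum(e - s for s, e in loops))    # total loop length: 130
```

The 10-bp gap between 200 and 210 is dropped because it does not exceed the 19-bp cutoff.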

The total length of the S-loops is 1,393,071 bp, 25.3% of the chromosome. This is comparable to the whole 
genome size of the Lyme disease pathogen Borrelia burgdorferi (1.44 Mb), 21,22 and more than twice that 
of Mycoplasma genitalium, which has a minimal genome (0.58 Mb). 23 The average G + C content of the 
S-loops is 48.5 mol%, significantly lower than that of the conserved backbone (Table 1 and Fig. 1), 
suggesting that many of the loops are of foreign origin. The S-loops can be divided into two groups: 
loops apparently composed of prophages or phage remnants, and the rest. The G + C content of the latter 
group is more atypical, while the phage loops have an average G + C content more similar to that of the 
backbone (Table 1). This is because most regions encoding the phage-essential genes are generally similar 
to the backbone in base composition, whereas regions on the phage genomes that are apparently 
non-essential for phage propagation often exhibit atypical base compositions. 
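The Table 1 length figures behind these statements are internally consistent, which a few lines of arithmetic confirm: the backbone and the S-loops add up to the chromosome, the 670,944-bp and 722,126-bp loop groups add up to the S-loop total, the S-loops make up about 25.3% of the chromosome, and the larger phage-related group is about 48.2% of the strain-specific sequence. A Python sketch; the constant names are illustrative.

```python
CHROMOSOME = 5_498_450
BACKBONE = 4_105_380
S_LOOPS = 1_393_070      # Table 1 value
PHAGE_LOOPS = 670_944
OTHER_LOOPS = 722_126

# Length bookkeeping implied by Table 1.
assert BACKBONE + S_LOOPS == CHROMOSOME
assert PHAGE_LOOPS + OTHER_LOOPS == S_LOOPS

print(round(100 * S_LOOPS / CHROMOSOME, 1))   # 25.3 (% of chromosome)
print(round(100 * PHAGE_LOOPS / S_LOOPS, 1))  # 48.2 (% of S-loop sequence)
```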

The comparative analysis of codon usage between the genes on the backbone and those on the S-loops also 
suggests the abundance of foreign genes on S-loops (Fig. 3). Codons that are used frequently in the 
backbone genes are less frequently used in the S-loop genes. Conversely, codons that are less frequently 
used in the backbone genes are more frequently used in the S-loop genes. This indicates that codon usage 
in the S-loop genes is atypical. 
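The comparison behind Fig. 3 amounts to counting codons per gene group and listing them in order of their frequency in the backbone genes. A minimal Python sketch, assuming each group is given as a list of in-frame coding sequences; the two toy groups stand in for the four groups in Fig. 3, and expressing frequencies per 1000 codons is an illustrative choice, not necessarily the normalization used for the figure. Only codons observed in the reference group are ranked.

```python
from collections import Counter
from typing import Dict, List

def codon_frequencies(cds_list: List[str]) -> Dict[str, float]:
    """Codon usage per 1000 codons, pooled over a group of in-frame coding sequences."""
    counts: Counter = Counter()
    for cds in cds_list:
        seq = cds.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in counts.items()} if total else {}

def order_by_reference(ref: Dict[str, float]) -> List[str]:
    """Codons sorted by descending frequency in the reference group (ties alphabetical)."""
    return sorted(ref, key=lambda c: (-ref[c], c))

if __name__ == "__main__":
    # Hypothetical toy sequences standing in for backbone and S-loop genes.
    groups = {
        "conserved": ["ATGCTGCTGTAA", "ATGCTGAAATAA"],
        "other loops": ["ATGCTAAAGTGA"],
    }
    freqs = {name: codon_frequencies(seqs) for name, seqs in groups.items()}
    for codon in order_by_reference(freqs["conserved"]):
        row = "  ".join(f"{freqs[g].get(codon, 0.0):6.1f}" for g in groups)
        print(codon, row)
```

Ranking every group against the backbone order is what makes a shift in usage visible: in the figure, codons common in backbone genes are less common in S-loop genes, and vice versa.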

3.2. Mobile genetic elements 

A number of mobile genetic elements were identified on the O157 Sakai chromosome: 20 kinds of insertion 
sequences (80 copies in total, of which 45 are truncated or partially deleted copies) and 18 prophages or 
phage remnants (Tables 1 and 2 in the Supplement section and Fig. 1). Of the 20 kinds of IS elements, 
seven were newly identified in this study. O157 Sakai and K-12 share eight types, but the major IS 
elements in each strain are completely different. The most abundant IS elements on the O157 Sakai 
chromosome are IS629 (19 copies) and the ISr575-related elements (ISEc8, 682, and 683; 16 copies in total). 

Of the 18 prophages or phage remnants (Sakai prophages; Sp1-18), 13 are lambda-like phages resembling each 
other. All these lambda-like phages, including the Stx1- and Stx2-transducing phages, which correspond to 
Sp15 and Sp5, respectively, 7,8 contain various types of deletions and/or insertions of IS elements in the 
phage-essential regions, and thus are apparently defective. They nevertheless show surprisingly high 
similarities to each other, even at the nucleotide sequence level (Table 2). It is unknown how these 
highly homologous sequences were initially brought in and how they are maintained on a single chromosome, 
but recombination between the prophages may be responsible for the generation of some chimeric phages 
that share identical or nearly identical segments. Prophages other than the lambda-like phages include a 
Mu-like phage, a P4-like phage, and the remnants of P2-like and P22-like phages. Taken together, about 
half of the O157 Sakai-specific sequences (48.2%) are of bacteriophage origin, suggesting the predominant 
role of bacteriophages in the evolution of O157:H7. These phages indeed carry not only the Stx genes but 
also various genes potentially related to pathogenesis (see below). 

In addition to the 18 prophages, six chromosomal regions of O157 Sakai exhibit prophage-like features 
(Sakai prophage-like elements, SpLE1-6), though they contain no genes with significant homology to known 
bacteriophage genes, except for those encoding integrase-like proteins (Table 2 in the Supplement 
section). SpLE1 and SpLE4 share some identities with "CP4 cryptic prophages," prophage-like elements of 
K-12. 5,24 The former corresponds to S-loop72 (the second largest loop, 86.2 Kb in size), a part of which 
has been described as a "tellurite resistance- and adherence-conferring island." 25 The latter includes 
the LEE locus. Both elements encode P4 integrase-like proteins and are integrated into tRNA genes, 
accompanied by duplications of phage attachment site-like (att-like) sequences. Furthermore, the two 
SpLEs share several homologous genes with the CP4 prophages of K-12, such as ECs1405 (on SpLE1)/ECs4538 
(on SpLE4)/yeeK (on CP4-44), ECs1406/ECs4539/i/eeC/, and ECs1407/ECs4540/yee, suggesting that both SpLEs 
are CP4-like elements. SpLE5 encodes a phage integrase-like protein and is integrated into a tRNA gene 
(leuX) with a 26-bp duplication. SpLE3 and SpLE6 also encode P4 integrase-like proteins and are 
apparently integrated into tRNA genes, but their right-end regions containing the attR sites have 
probably been lost through IS insertions and subsequent genetic rearrangements. SpLE2 corresponds to 
CP4-44, but its internal portion is replaced by a different sequence. Although we do not have direct 
evidence that these elements are really prophages, the features mentioned above suggest that they are 
at least phage-like mobile genetic elements. 

Figure 3. Codon usage analysis of the O157 Sakai chromosomal genes. The genes were divided into four 
groups: those on the conserved backbone (conserved), on the prophages carrying tRNA genes (tRNA(+) 
phages), on the prophages not carrying tRNA genes (tRNA(-) phages), and on the other S-loops (other 
loops). Then the codon frequency was calculated for each group. Codons are put in the order of frequency 
in the genes on the conserved backbone. 

tRNA genes have been repeatedly reported as integration sites for various genetic elements, including bacteriophages. 26,27 In O157 Sakai, a total of 10 tRNA or tmRNA genes are used as integration sites for phages or phage-like elements. At two loci, two different phages are integrated in tandem. Each of these loci encodes a single tRNA (or tmRNA) gene, except leuZ, which is located furthest downstream in the glyW-cysT-leuZ tRNA gene cluster. Thus, 36% (9 of 25) of the single-tRNA gene loci on the O157 Sakai chromosome are occupied by phages or phage-like elements. Viewed from the side of the elements, 11 of the 24 phages or phage-like elements use such single-tRNA gene loci as integration sites, indicating that single-tRNA loci are the most favored integration sites for these elements.
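The occupancy figures above follow from simple counting. The sketch below reproduces them, assuming (as the text implies but does not state outright) that leuZ is the only used site lying in a multi-tRNA cluster and that both tandem-integration loci are single-tRNA loci; the constant names are illustrative only.

```python
# Sketch of the integration-site arithmetic described in the text.
# Assumptions: of the 10 tRNA/tmRNA genes used as integration sites, all
# except leuZ are single-tRNA loci, and both tandem loci are single-tRNA loci.

TRNA_SITES_USED = 10          # tRNA/tmRNA genes carrying phages or phage-like elements
CLUSTERED_SITES_USED = 1      # leuZ, the only used site inside a multi-tRNA cluster
SINGLE_TRNA_LOCI_TOTAL = 25   # single-tRNA gene loci on the chromosome
TANDEM_LOCI = 2               # loci carrying two different phages in tandem
ELEMENTS_TOTAL = 18 + 6       # 18 prophages plus six SpLEs

single_loci_used = TRNA_SITES_USED - CLUSTERED_SITES_USED
occupancy_pct = 100 * single_loci_used / SINGLE_TRNA_LOCI_TOTAL
# Each tandem locus contributes one extra element beyond its first occupant.
elements_at_single_loci = single_loci_used + TANDEM_LOCI

print(f"{single_loci_used}/{SINGLE_TRNA_LOCI_TOTAL} single-tRNA loci occupied ({occupancy_pct:.0f}%)")
print(f"{elements_at_single_loci}/{ELEMENTS_TOTAL} elements at single-tRNA loci")
```

If either assumption were wrong (for example, if one tandem pair sat at leuZ), the element count would come out as 10 rather than 11, so the figures in the text are consistent only with the reading encoded here.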

K-12 also carries three lambda-like phages (DLP12, Rac, and Qin), a phage-like element (e14), and four CP4 cryptic prophages (CP4-6, -44, -57, and an unnamed one). 5,24 Although the endpoints of the three lambda-like phages have not been exactly defined, we could determine their possible endpoints by sequence comparison with O157 Sakai (Table 3 in the Supplement section). It is noteworthy that Rac is integrated into the same site as Sp10 of O157 Sakai, and that Rac and Sp10 share a ca. 21-Kb segment extending from a putative integrase gene, b1345 in MG1655 (ECs1929 in O157 Sakai), to the 5' part of ydaW (ECs1946), although several small internal segments have been replaced. Qin and Sp12 also share a 3.8-Kb right-end segment encoding the dicF RNA gene, dicB (ECs2284), ydfD (ECs2285), and ydfE (ECs2286), but the structures of the very ends differ. The attR-containing regions of both Qin and Sp12 appear to have been deleted.

Table 3. Functions of strain-specific ORFs on the O157 Sakai chromosome.

Function                      Number   Percent*
Function assigned                873    53.5
  Metabolism                      60     3.7   (6.9)
  Transport                       56     3.4   (6.4)
  DNA/RNA processing              22     1.3   (2.5)
  Regulation                      38     2.3   (4.4)
  LPS synthesis                   17     1.0   (1.9)
  [?]                             47     2.9   (5.4)
  Virulence-related†              84     5.1   (9.6)
  Phage-related                  400    24.5  (45.8)
  IS-related                     114     7.0  (13.1)
  Unclassified                    35     2.1   (4.0)
Conserved hypothetical           384    23.5
Hypothetical‡                    375    23.0
Total                          1,632   100

* The numbers in parentheses represent the percent of function-assigned ORFs.
† Genes for type III secretion systems are included.
‡ No database hit.

In addition to the four previously identified CP4 prophages, we have newly identified two phage-like elements on the MG1655 chromosome (K-12 prophage-like elements KpLE1 and KpLE2) (Table 3 in the Supplement section). KpLE1 is integrated into the argW tRNA gene with a 16-bp sequence duplication. KpLE2 is apparently integrated into the leuX tRNA gene, but its right-end region has been deleted. Since K-12 was originally lysogenized by phage lambda, it contained a total of 11 prophages or phage-related elements, again demonstrating the predominant role of bacteriophages in generating genetic diversity among E. coli strains.

Another possible mobile genetic element is the rhs element. 28 O157 Sakai contains 9 rhs elements at 7 loci (Fig. 1 in the Supplement section). Four (rhsA, C, D, and E) are conserved in K-12 and two (rhsF and G) are the same as elements identified in other E. coli strains, 29 but the remaining three are newly identified in O157 Sakai (designated rhsI, J, and K). The significance of this diversity and the physiological functions of the rhs elements in E. coli strains remain to be elucidated.

3.3. RNA genes 

O157 Sakai contains seven rrn operons (rrnA-H), and their locations and orientations on the chromosome are the same as those in K-12 (Table 1), though some intraspecific sequence diversity of rRNAs exists between the two strains. 9 In contrast, the compositions of tRNA genes differ remarkably between the two strains (Table 1). Of the 102 tRNA genes of O157 Sakai, 82 are conserved in K-12 but 20 are absent in K-12. Conversely, of the 86 tRNA genes identified in K-12, 4 are absent in O157 Sakai. Two leucine tRNA genes (leuPV) and one lysine tRNA gene (lysQ) in the leuQPV and lysYZQ loci are absent in O157 Sakai. In the argX-hisR-leuT-proM locus, the leuT-proM portion is duplicated in O157 Sakai (or deleted in K-12), yielding two O157 Sakai-specific genes, but the first proM of O157 Sakai has become a pseudogene through extensive base changes in its 3' region. The other 18 tRNA genes present only in O157 Sakai reside on the genomes of 7 lambda-like phages and are essentially the same as the ileZ-argN-argO genes identified in phage 933W. 30 However, three of the genes corresponding to argN have undergone extensive base changes and have become pseudogenes. As a consequence, O157 Sakai contains seven copies of ileZ (anticodon CAU), four copies of argN (UCG), and seven copies of argO (UCU). These three tRNA species have been proposed to recognize the Ile codon ATA (ileZ), a subset of the CGN family of Arg codons (argN), and a subset of the AGN family of Arg and Ser codons (argO). 30 They can thus recognize the five codons that are used most rarely in the genes on the conserved backbone but with dramatically increased frequency in the genes on the S-loops: ATA, CGA, CGG, AGA, and AGG (Fig. 3). When codon usage was compared among the genes on the phages carrying the tRNA genes, on the phages not carrying them, and on the other loops, there was essentially no difference. This suggests that these phage-encoded tRNAs may be required not only for the efficient expression of the genes on the tRNA gene-bearing phages but also for that of the genes on other phages and loops.
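The codon-usage comparison behind Fig. 3 amounts to pooling codon counts per gene group and ordering codons by their frequency on the backbone. A minimal Python sketch of that calculation follows; it assumes in-frame coding sequences and per-thousand normalization (the text does not state the normalization used), and the sequences are hypothetical toy data, not O157 Sakai genes.

```python
from collections import Counter
from itertools import product

# All 64 codons, including the three stop codons.
CODONS = ["".join(triplet) for triplet in product("TCAG", repeat=3)]
# The five codons discussed in the text as rare on the backbone.
RARE_CODONS = ["ATA", "CGA", "CGG", "AGA", "AGG"]


def codon_frequencies(cds_list):
    """Return per-thousand codon frequencies pooled over a group of CDSs.

    Each sequence is assumed to be in frame from its first base; a trailing
    partial codon is ignored, and codons with ambiguous bases are skipped.
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {c: (1000.0 * counts[c] / total if total else 0.0) for c in CODONS}


def usage_table(groups, reference="conserved"):
    """Compute frequencies per group and order codons by the reference group."""
    freqs = {name: codon_frequencies(seqs) for name, seqs in groups.items()}
    order = sorted(CODONS, key=lambda c: freqs[reference][c], reverse=True)
    return freqs, order


# Hypothetical toy sequences, NOT real O157 Sakai genes.
groups = {
    "conserved": ["ATGCGTCGCAAACTGTAA", "ATGCTGGAAATTCGTTAA"],
    "tRNA(+) phages": ["ATGAGAAGGATACGATAA"],
    "tRNA(-) phages": ["ATGCGGATAAGACTGTAA"],
    "other loops": ["ATGATAAGGCGACGTTAA"],
}

freqs, order = usage_table(groups)
for codon in RARE_CODONS:
    row = "  ".join(f"{name}={freqs[name][codon]:6.1f}" for name in groups)
    print(f"{codon}: {row}")
```

With real data, the rows for the three non-backbone groups would be compared against each other, which is the comparison the text reports as showing essentially no difference.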

3.4. Protein-coding regions

Among the 5361 ORFs identified on the O157 Sakai chromosome, 3729 are conserved in K-12 (referred to as conserved ORFs), and the remaining 1632 are not present in K-12 (specific ORFs) (Table 1). Table 3 is a simplified presentation of the functions of the specific ORFs. The functions of 873 specific ORFs are predicted by sequence similarity to known proteins, and 384 are similar to proteins of unknown function; the remaining ORFs are unique to O157 Sakai. ORFs for phage-related and IS-related functions occupy large fractions of the function-assigned ORFs (45.8% and 13.1%, respectively), reflecting the abundance of these genetic elements on the chromosome. Genes with virulence-related functions, including those for fimbrial biosynthesis, also represent a large functional group (15%). Many genes involved in transport and metabolism were identified as well.
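The percentages in Table 3 are simple ratios against two denominators: all 1,632 specific ORFs and the 873 function-assigned ORFs. The sketch below recomputes a few of them from the table's counts as a consistency check; the dictionary names are illustrative, and only rows with unambiguous labels are included.

```python
# Counts transcribed from Table 1 and Table 3 of this section.
TOTAL_ORFS = 5361
CONSERVED_ORFS = 3729
SPECIFIC_ORFS = 1632

top_level = {"Function assigned": 873, "Conserved hypothetical": 384, "Hypothetical": 375}
function_assigned = top_level["Function assigned"]
subcategories = {"Phage-related": 400, "IS-related": 114, "Virulence-related": 84,
                 "Metabolism": 60, "Transport": 56}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as printed in the table."""
    return round(100 * part / whole, 1)


# The ORF partitions should add up exactly.
assert CONSERVED_ORFS + SPECIFIC_ORFS == TOTAL_ORFS
assert sum(top_level.values()) == SPECIFIC_ORFS

for name, count in top_level.items():
    print(f"{name:24s} {count:5d} {pct(count, SPECIFIC_ORFS):5.1f}%")
for name, count in subcategories.items():
    print(f"  {name:22s} {count:5d} {pct(count, SPECIFIC_ORFS):5.1f}% "
          f"({pct(count, function_assigned):.1f}% of function-assigned)")
```

Running the check this way makes it easy to spot a mismatch between a figure quoted in the prose and the corresponding table row.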

Among the genes on the O157 Sakai chromosome, a total of 2292 ORFs constitute 630 paralogous gene families. Of the 630 families, 345 consist only of conserved ORFs (906 ORFs) and 157 only of specific ORFs (637 ORFs), whereas 128 families contain both groups of ORFs (445 conserved ORFs and 304 specific ORFs). Thus, fewer than one-fifth of the specific ORFs have paralogs among the conserved ORFs, suggesting again that many of the specific genes were acquired by horizontal gene transfer rather than by duplication of preexisting genes. This notion may be further supported by the finding that only 392 specific ORFs, including 46 transposases, have counterparts in COGs (Clusters of Orthologous Groups of proteins; http://www.ncbi.nlm.nih.gov/COG/xindex.html), 31 in sharp contrast to the 3240 of the 3729 conserved ORFs (86.9%) that have counterparts in COGs.
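The "fewer than one-fifth" claim and the COG comparison can be checked directly from the family counts above. In this sketch, specific ORFs with a conserved paralog are taken to be exactly those in the 128 mixed families, which is how the text's argument reads; the variable names are illustrative.

```python
# Paralogous families: (number of families, conserved ORFs, specific ORFs).
families = {
    "conserved only": (345, 906, 0),
    "specific only": (157, 0, 637),
    "mixed": (128, 445, 304),
}
SPECIFIC_ORFS = 1632
CONSERVED_ORFS = 3729

n_families = sum(f for f, _, _ in families.values())
n_orfs = sum(c + s for _, c, s in families.values())
# Only specific ORFs in mixed families have a paralog among the conserved ORFs.
specific_with_conserved_paralog = families["mixed"][2]
share = specific_with_conserved_paralog / SPECIFIC_ORFS

cog_specific = 392 / SPECIFIC_ORFS
cog_conserved = 3240 / CONSERVED_ORFS

print(n_families, n_orfs)  # 630 2292
print(f"{share:.1%} of specific ORFs have conserved paralogs")
print(f"COG hits: specific {cog_specific:.1%}, conserved {cog_conserved:.1%}")
```

The contrast is roughly 24% of specific ORFs versus 86.9% of conserved ORFs having COG counterparts, which is the gap the text uses to support horizontal acquisition.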

3.5. Virulence-related genes 

Adhesion to the tissue surface is the first step for bacteria to establish infection. The fimbria (also called pilus) is an important adhesion apparatus found in many Gram-negative bacteria, and some surface proteins are also used as afimbrial adhesins. On the O157 Sakai chromosome, a total of 14 loci, each encoding a set of genes for fimbrial biosynthesis, were identified. Five loci are conserved in K-12 and four are unique to O157 Sakai. The remaining five are partially conserved: similar gene sets are present at the same loci in both strains but have undergone significant sequence changes and/or gene rearrangements (Fig. 2 in the Supplement section). The newly identified loci include one encoding a gene set (ECs4426-4431) similar to that for Salmonella long polar fimbriae, 32 but no genes for type IV pili are present in O157 Sakai. Besides the 14 fimbrial gene clusters, at least 14 genes encode adhesin/invasin-like proteins, including two previously identified proteins: gamma-intimin on the LEE locus and the Iha adhesin. 25,33 At present, it is unknown under what circumstances these genes for fimbrial biosynthesis and adhesin-like proteins are expressed and whether they are actually involved in bacterial adherence. The presence of multiple sets of these genes, however, suggests that O157 Sakai may bind or adhere to a broader spectrum of tissues, cells, and molecules encountered in various environments than has been recognized.

It should also be noted that O157 Sakai encodes two proteins belonging to the TrcA chaperone-like protein family, ECs1825 and ECs3485. In enteropathogenic E. coli (EPEC), TrcA is encoded in the LIM locus and is required for the normal production of bundle-forming pili and for the full development of the "localized adherence" phenotype. 34 ECs1825 and ECs3485 are encoded in loci similar or almost identical to LIM, and may participate in the adherence of O157 Sakai as well. Both loci reside on the genomes of lambda-like phages, Sp9 and Sp17, respectively.

A type III secretion system encoded by the LEE locus is responsible for the formation of the attaching and effacing (A/E) lesion, an essential step in establishing EHEC infection. 33,35 We have identified a second locus that encodes a new type III secretion system, designated ETT2 (E. coli type three secretion 2). Although the locus resembles the inv/spa locus on the Salmonella SPI-1 pathogenicity island, 36,37 it encodes only the components of a secretion apparatus, not secreted effector proteins. It has, however, been shown that the Inv/Spa system translocates not only the proteins encoded on the SPI-1 locus but also those encoded outside it. 38,39 Thus, effector proteins translocated via the ETT2 machinery may also be encoded outside the locus. In this regard, it is noteworthy that O157 Sakai encodes several proteins with some similarity to known effector proteins, such as EspF (ECs1126 and 2715). Since the cloned LEE element failed to confer on K-12 the ability to secrete Esp effector proteins and express the A/E phenotype, 40 it is also possible that ETT2 complements the function of the LEE locus. Cloned ETT2 does in fact show an ability to secrete EspB in K-12 (Tobe, T. et al., unpublished data).

O157:H7 strains are known to produce Stxs (Stx1 and/or Stx2) and enterohemolysin. Stxs are deeply involved in the pathogenesis of HUS, one of the life-threatening complications of EHEC infection, though the role of enterohemolysin has not been well elucidated. 41 In O157 Sakai, Stx1 and Stx2 are encoded on the genomes of two lambda-like phages and enterohemolysin on pO157, as in other EHEC strains. 6-8 pO157 also encodes a protein resembling the LCT toxins, 6 which are the major virulence factors of Clostridium difficile, a causative agent of severe antibiotic-associated colitis. 42 In addition to these toxins, we have identified two genes encoding toxin-like proteins, ECs0542 and ECs1283. The former encodes an extremely large protein of 5292 amino acids, which belongs to the RTX toxin family containing repeated glycine-rich motifs (GGXXGXD) in the C-terminal domain. 43 It resides on S-loop42 together with four other genes, three of which are probably responsible for its secretion (ECs0540, 0543, 0544). ECs1283 encodes a hemagglutinin/hemolysin-like protein and is followed by a gene encoding a protein belonging to the hemolysin-secretion/activation protein family. 44

Another unexpected finding is the presence of a large number of genes that may confer on O157 Sakai an increased capability to survive in phagosomes. One lom-like gene (lomX), one copper/zinc-superoxide dismutase (SOD) gene (sodC), and two catalase genes (katE and katG) are present in the conserved region. In addition, the strain contains 11 lom/rck/pagC-like genes, 2 copper/zinc-SOD genes, and 2 catalase genes that are absent in K-12. All of these are encoded on the genomes of lambda-like phages except for one catalase gene (katP), which is carried on pO157. Two copies of bor and one copy of traT, which may confer serum resistance, were also identified. These findings, together with the identification of the Inv/Spa-like type III secretion system, raise the challenging hypothesis that intracellular and/or invasive phases may exist at some stage of O157 infection.

Iron transport systems are also important in bacterial pathogenesis, since access to this essential nutrient is severely limited in the host. E. coli strains possess multiple iron transport systems. 45 Most genes for iron acquisition identified in K-12 are conserved in O157 Sakai, but the fec operon encoding a citrate-dependent iron transport system is missing. The strain, however, contains a set of genes that possibly encodes a different citrate-dependent transport system resembling that of Synechocystis spp. 46 Besides this, the strain possesses two additional loci for iron acquisition: one similar to the afu operon of Actinobacillus pleuropneumoniae 47 and the other to the shu operon of Shigella dysenteriae type 1. 48,49 Furthermore, another gene cluster on S-loop83 (ECs1693-1699) may also be involved in iron transport, since it contains genes for a TonB-dependent outer membrane receptor protein and for proteins with some similarity to components of iron transport systems. Thus, the iron acquisition ability of O157 Sakai is apparently increased compared with that of K-12.

3.6. Transport and metabolism 

O157 Sakai contains a number of genes related to transport and metabolic functions that are not present in K-12. Many of the transport systems belong to the ABC transporter family or the phosphoenolpyruvate:carbohydrate phosphotransferase (PTS) system; 25 and 13 ORFs encode components of ABC transporters and PTS systems, respectively. O157 Sakai possesses operons for the transport and utilization of sucrose and sorbose, as well as a urease operon, but the sorbose operon is disrupted by the insertion of a Mu-like phage. The strain also possesses a glutamate-fermentation system and two aromatic acid degradation systems that are not present in K-12. One of the aromatic acid degradation systems is probably for salicylate degradation, and the other is a non-oxidative decarboxylation system similar to the vdc system of Streptomyces sp. 50 Operons for a multidrug-efflux transport system and for tellurite resistance were also identified. The latter is probably responsible for high-level resistance to tellurite, one of the phenotypes used for the laboratory isolation of O157 strains. 51 The operon is similar to the terZ-F operon on incHI2 and incHII plasmids, 52 and is located on an 86.2-kb CP4-like element (SpLE1), where the urease operon, traT, and iha also reside. Although substrates could not be predicted in the other cases, most of these systems are apparently for the transport and degradation of small molecules. Functional analyses of these strain-specific transport and metabolic genes are also important because they will provide information essential for developing new selective media for more efficient isolation of O157.



Relatively fewer genes for biosynthetic pathways than for degradation were identified among the specific ORFs. The rfa and rfb loci encode the genes required for the synthesis of the polysaccharide moieties of lipopolysaccharide: the R3-type core and the O157-specific O antigen, respectively. 53,54 On S-loop71 and S-loop225, we have identified two loci that probably encode specialized or modified systems for fatty acid biosynthesis. A locus on S-loop225 encodes a methyltransferase, an acyltransferase, and a set of proteins required for elongation cycles in fatty acid synthesis: 55 two acyl carrier proteins (ACPs), a beta-hydroxyacyl-ACP dehydratase, a beta-ketoacyl-ACP synthetase (KAS) II, a beta-hydroxydecanoyl-ACP dehydratase, a beta-ketoacyl-ACP reductase, and a protein consisting of two fused KAS II molecules. A locus on S-loop71 also encodes a holo-ACP synthetase, a beta-ketoacyl-ACP reductase, a beta-hydroxyacyl-ACP dehydratase, an ACP, an aminomethyltransferase, and a KAS I. Both loci probably participate in the synthesis of fatty acid-containing molecules such as lipids, liposaccharides, or lipoproteins that are specifically produced by O157 Sakai but not by K-12. It is of interest that the second locus is located just downstream of the genes for the hemolysin-like and hemolysin-secretion/activation proteins (ECs1283 and 1284), constituting an operon-like structure. The locus might participate in the acylation and activation of the hemolysin-like protein, as has been demonstrated for the alpha-hemolysin of uropathogenic E. coli. 56

3.7. K-12 genes missing in O157 Sakai

Compared with K-12, O157 Sakai lacks a total of 567 ORFs, including the above-mentioned fec operon and the genes for utilization of 2-phenylethylamine, xanthosine, D-galactonate, L-idonate, glycolate, and short-chain fatty acids. The inability to rapidly ferment D-sorbitol and the lack of beta-glucuronidase (GUD) activity are phenotypes that serve as markers for the laboratory identification of O157:H7. 57,58 In typical E. coli strains, utilization of D-sorbitol occurs through a pathway initiated by a specific PTS system encoded by the gut operon. 59 In O157 Sakai, both the gutA and gutE genes, encoding the sorbitol-specific PTS enzymes IIC and IIB, have authentic frameshifts, and thus sorbitol transport is impaired. The slow sorbitol fermentation may be explained by the finding that the PTS system for D-mannitol can act on D-sorbitol with low affinity. 59 The uidA gene encoding GUD is also disrupted in O157 Sakai, by a two-base insertion at nucleotide position 690. This mutation differs from that reported for the uidA gene of other O157:H7 strains. 60
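Why a two-base insertion disrupts a coding gene such as uidA can be illustrated on a toy sequence: any insertion whose length is not a multiple of three shifts the reading frame, so every downstream codon changes. The sketch below uses a short hypothetical ORF, not the actual uidA sequence, and 0-based indexing; the helper names are illustrative.

```python
def codons(seq: str) -> list:
    """Split a sequence into complete codons in frame 0 (a trailing partial codon is dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_bases(seq: str, pos: int, bases: str) -> str:
    """Insert `bases` before 0-based index `pos`."""
    return seq[:pos] + bases + seq[pos:]


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


toy = "ATGGCTGCAAAAGGTTGA"            # hypothetical ORF: ATG GCT GCA AAA GGT TGA
mutant = insert_bases(toy, 6, "TT")   # two-base insertion after the second codon

print(codons(toy))     # ['ATG', 'GCT', 'GCA', 'AAA', 'GGT', 'TGA']
print(codons(mutant))  # ['ATG', 'GCT', 'TTG', 'CAA', 'AAG', 'GTT']
print(is_frameshift(2), is_frameshift(3))  # True False
```

In the toy mutant every codon after the insertion point changes and the original TGA stop is no longer in frame; in a real gene the shifted frame typically runs into a premature stop codon, which is the usual way such insertions abolish enzyme activity.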

Genes for the general secretion pathway (GSP) identified on the K-12 chromosome do not exist on the O157 Sakai chromosome, but a different set of GSP genes resides on pO157 instead. 6 None of the K-12 genes for DNA restriction and modification (hsdSMR, mrr, mcrA, and mcrBC) are present in O157 Sakai. In place of hsdSMR and mrr, a type I system almost identical to the EcoA system was identified. Thus, the two strains have completely different sets of DNA restriction/modification systems.

Among the two-component regulatory systems identified in K-12, 61 only two are missing in O157 Sakai: those encoded by atoSC and ygiYX. The former is deleted together with the other ato genes, while ygiY is split into two parts by a premature stop codon. This high level of conservation of signal transduction systems between the two strains implies that the expression of most strain-specific genes is under the control of global regulatory networks encoded on the conserved backbone, even though many of the strain-specific genes were horizontally acquired. This notion gains some support from the finding that most of the K-12 genes involved in global regulatory functions, including all the sigma factors, are encoded on the backbone. 62 Indeed, the 38 O157 Sakai-specific regulators include no sigma factor, and we could identify only two O157 Sakai-specific two-component regulatory systems (ECs0417/0418 and ECs5067/5074).

3.8. Conclusions 

Genomic comparison of O157 Sakai with K-12 provided a large amount of information of biological and medical importance. The presence of a well-conserved 4.1-Mb sequence that can be regarded as the chromosome backbone of E. coli, together with numerous strain-specific DNA segments of foreign origin ("strain-specific loops"), indicates how the two strains have diversified from a common ancestral lineage. There is no doubt that bacteriophages have played the predominant role in this process. This mode of diversification probably represents a general pathway of intraspecific evolution for most E. coli strains, though final verification must wait until the genomes of strains belonging to other lineages are analyzed. Identification of the complete set of genes specifically present in O157 Sakai will provide new insights into the pathogenicity and physiology of O157, and open a way to fully understand the molecular mechanisms underlying O157 infection and to develop new strategies for its prevention, treatment, and surveillance.

Supplementary information is available in the Supplement section of this issue. It is also available at DNA Research Online [http://www.dna-res.kazusa.or.jp/] and on the authors' World-Wide Web site [http://genome.gen-info.osaka-u.ac.jp/bacteria/o157/]. The chromosome sequence has been deposited in DDBJ/EMBL/GenBank under the accession numbers BA000007 and AP002550-AP002569.